from IPython.display import Image
Image("algar-logo.png", width=600)
Algar-Kamaji is a currency pair monitor that watches the market's main financial signals and indicators, giving the user a way to track market trends alongside their international currency invoice list. In this way, the system acts as a support dashboard that brings insights into the daily routine of a financial operation of any size.
No confidential user information is stored on the system, nor any operation-critical data; the system should not be used as primary storage for any critical information. The only piece of user information it consumes is the invoice expiry date.
The system is modular and does not need to inherit past usage data: you can invoke it exactly as you did the first time.
Image('kamaji-main.png')
Python2.7
Python3.6
pip
git
postgresql (postgresql://postgres:postgres@localhost/postgres)
git clone https://github.com/sudoferraz/algar-kamaji
cd algar-kamaji/
pip install -r requirements.txt
With the PostgreSQL service active:
cd algar-kamaji/
gunicorn -b 0.0.0.0:5000 apiserver:app
python interface.py
You can open a local browser and access localhost:5000/test; if it answers, you are live and good to go!
. Gunicorn - Listens on port 5000 for HTTP requests from the local network. Gunicorn is flexible and attaches easily to the Flask web app when invoked from the command-line interface. Gunicorn receives the request from the LAN, translates it into a Web Server Gateway Interface (WSGI) compatible request and calls the Flask request handler.
. Flask - Used as the web framework: it calls the business logic in a modular controller and returns an HTTP response. Flask makes network calls to the database and other external services concurrently using non-blocking I/O, so the application can execute one request's business logic while waiting on the socket for other requests.
. SQLAlchemy - An open-source SQL toolkit and object-relational mapper (ORM) that connects to the database and defines its structure through Python objects. Helper methods are inherited by the SQLAlchemy subclasses to abstract away raw SQL scripts.
. Python2.7 - The business logic and the database access triggered by user actions are encapsulated by a middleware that validates input and returns formatted HTTP responses.
. PostgreSQL - SQL database
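The Gunicorn-to-Flask hand-off above follows the WSGI contract. The sketch below is illustrative only: the callable name and the /test route mirror the health check mentioned earlier, but this is not the real apiserver module.

```python
def app(environ, start_response):
    # environ carries the parsed HTTP request (path, method, headers);
    # Gunicorn builds it from the raw socket data and then calls the app.
    if environ.get('PATH_INFO') == '/test':
        status, body = '200 OK', b'alive'
    else:
        status, body = '404 Not Found', b'not found'
    start_response(status, [('Content-Type', 'text/plain'),
                            ('Content-Length', str(len(body)))])
    return [body]
```

Any WSGI server (Gunicorn included) can serve a callable with this signature, which is why the Flask app can be swapped in from the command line.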
Image('kamaji-architecture.png')
Algar-Kamaji only connects to the internet to fetch the latest market data, and cannot be accessed from outside the local area network it is hosted on. It calls the Yahoo API to fetch these dataframes several times on each user interaction, and polls on a 30-second loop when no user is detected. The API is accessed via the pandas-datareader package, which wraps an HTTP request to the Yahoo Finance API on port 80.
The response is parsed directly in the middleware, namely 'interface.py', which then serves the processed data to the other modules; it is important to note that no persistent objects are shared between them.
Every layer can then perform specific business logic with the help of the module "auxiliary.py", described later in this document.
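The refresh cycle described above can be sketched as follows. This is a hedged illustration, not the actual interface.py code: `fetch` stands in for the pandas-datareader call, and the `sleep` hook makes the 30-second idle loop testable.

```python
import time

def poll_market(fetch, user_active, cycles, interval=30, sleep=time.sleep):
    """Fetch fresh market data each cycle; idle `interval` seconds
    between fetches whenever no user interaction is detected."""
    frames = []
    for _ in range(cycles):
        frames.append(fetch())      # would hit the Yahoo API in production
        if not user_active():
            sleep(interval)         # the 30-second loop from the text
    return frames
```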
import pandas as pd
import pandas_datareader.data as web
df = pd.read_csv('brlusd.csv')
display(df.head(5))
from stockstats import StockDataFrame
df.to_csv('brlusd.csv', mode='w', header=True)
data = StockDataFrame.retype(pd.read_csv('brlusd.csv'))
macdh = data['macdh']
print macdh[-1]
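For reference, the macdh column that stockstats produces can be approximated from first principles. This is a from-scratch sketch using the conventional 12/26/9 periods; it is not the library's exact implementation (stockstats seeds its EMAs differently, so values near the start of the series will differ).

```python
def ema(values, n):
    """Exponential moving average with smoothing factor 2 / (n + 1)."""
    alpha = 2.0 / (n + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def macd_histogram(closes, fast=12, slow=26, signal=9):
    """MACD line (fast EMA - slow EMA) minus its signal-line EMA."""
    macd = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    return [m - s for m, s in zip(macd, ema(macd, signal))]
```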
Image("erd_from_sqlalchemy.png")
Table created for logins and the actions taken; the password is stored as a hash. The web API also sends a hash directly via URL for login, so the plain-text password is never handled.
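The "hash before sending" idea can be illustrated as below. The digest algorithm (SHA-256) and function name are assumptions for the sketch; the real column name and salting policy are not specified here.

```python
import hashlib

def login_token(password):
    """Hash the plain-text password client-side so only the digest travels."""
    return hashlib.sha256(password.encode('utf-8')).hexdigest()
```

Note that a bare, unsalted hash sent in a URL can still be replayed or end up in server logs; a salted or challenge-based scheme would be stronger.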
Information about users who want to be notified of specific trend reversals and payment suggestions triggered by the financial signals.
Information about the forecast generated by the machine learning algorithm, which depends on the number of days used to calculate the market movement direction. It can also track an invoice, computing the difference between today and the invoice end date to create a specific forecast.
A log of the recent notifications triggered by the system, including what each notification was and the platform used (email, cell phone).
Indicator-derived numbers that represent characteristics of the market movement at any given time.
Table used for logging user actions that occur inside the system.
Reference table containing triggers that describe which signals the system uses to notify users of possible events and market movement changes.
Information on the user-chosen invoices, where the only data relevant to the system is the end date, namely dt_vencimento; each invoice can also carry user-chosen notes for better tracking and user experience. The status attribute refers to the invoice payment status, so the system can include the invoice in, or exclude it from, new predictions.
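The forecast input derived from dt_vencimento is simply the day count to the due date, which can be sketched with the standard library. The function name is illustrative; only the column name comes from the ERD.

```python
from datetime import date

def days_until_due(dt_vencimento, today=None):
    """Days between today and the invoice end date (negative if overdue)."""
    today = today or date.today()
    return (dt_vencimento - today).days
```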
Table that stores the latest market movement direction, namely bearish or bullish, depending on the MACD signal.
A specific user notification triggered when a bearish-to-bullish trend reversal is detected and there is an open invoice whose end date falls within the trend's duration.
The latest market indicators (equations that derive characteristics from the market price), computed from the close value.
Image("pgadmin.png")
Image("classes_Auxiliary.png")
Contains the business logic that all other modules and layers of the system can use. This approach guarantees a modular system that can be extended with new features. It also ensures that all returned objects are validated, error-free, formatted and ready to become an HTTP response when called by a middleware.
The middleware can then be in charge of specific routines and the relations between these auxiliary objects and functions. In this example, we show how a middleware module can read a market value from the database and create a signal based on its value:
import auxiliary
os_tools = auxiliary.ostools()
session = os_tools.db_connection()
indicator_handler = auxiliary.indicator_handler()
close = data['close'][-1]
bollinger_lb = indicator_handler.get_indicator_by_name(session, 'bollinger_low')
print close
print bollinger_lb.value
macdh_standard_deviation = macdh.std()
macdh_mean = macdh.mean()
macdh_distance = macdh[-1] - macdh_mean
macdh_standardized = macdh_distance / macdh_standard_deviation
print "Latest Macd Histogram : " + str(macdh[-1])
print "Macd Histogram Standard deviation : " + str(macdh_standard_deviation)
print "Macd Histogram Mean : " + str(macdh_mean)
print "Macd Histogram Standardized : " + str(macdh_standardized)
signal_handler = auxiliary.signal_handler()
macdh_indicator = indicator_handler.get_indicator_by_name(session, 'macd_histogram')
# Weight defined by the strategy; default is 1/3 of a standard deviation
if macdh[-1] > 0 and macdh_standardized > 0.33:
    macdh_signal = signal_handler.create_signal(session, macdh_indicator.id, macdh[-1])
else:
    macdh_signal = False
Image("close_price.png")
Image("macd_histogram.png")
Image("bollinger_bands.png")
Research aimed at finding the best format to represent financial time-series data, applying data analysis to prepare it for machine learning techniques.
%matplotlib inline
import pandas as pd
import pandas_datareader as web
from IPython.core.display import display
import matplotlib.pylab as plt
from stockstats import StockDataFrame
import seaborn as sns
sns.set()
df = web.DataReader('BRL=X', 'yahoo')
data = pd.DataFrame(df)
data = StockDataFrame.retype(data)
display(data.head())
data.plot(figsize=(15,10))
%matplotlib inline
import pandas as pd
import pandas_datareader as web
from IPython.core.display import display
import matplotlib.pylab as plt
from stockstats import StockDataFrame
import seaborn as sns
sns.set()
data = pd.read_csv('USDBRL/all_indicators.csv')
data = StockDataFrame.retype(data)
copy = data.copy()
display(data.tail())
#How much of the data is missing
counter_nan = data.isnull().sum().sort_values(ascending=False)
plt.figure(figsize=(15,10))
plt.scatter(counter_nan, counter_nan.values)
plt.show()
#How many columns do not have a single NaN
counter_without_nan = counter_nan[counter_nan==0]
print " [+] Number of columns that do not have a NaN: " + str(len(counter_without_nan))
print " [+] Number of total columns: " + str(len(data.columns))
display(data[counter_nan.keys()].head())
from pandas.util.testing import assert_series_equal
import numpy as np
# Taking out columns that have all values as 0 or equal values
data = StockDataFrame.retype(data)
cols = data.select_dtypes([np.number]).columns
diff = data[cols].diff().sum()
data = data.drop(diff[diff==0].index, axis=1)
data = data.drop('adj close', 1)
display(data.tail())
data = data[14:-14]
counter_nan = data.isnull().sum().sort_values(ascending=False)
display(data[counter_nan.keys()].head())
plt.figure(figsize=(15,10))
plt.scatter(counter_nan, counter_nan.values)
plt.show()
print " [+] Number of columns that do not have a NaN: " + str(len(counter_nan[counter_nan==0]))
print " [+] Number of total columns: " + str(len(data.columns))
#Back filling for holidays and exceptional days on the market
data = data.fillna(method='bfill')
data = data[1:-1]
counter_nan = data.isnull().sum().sort_values(ascending=False)
print " [+] Number of columns that do not have a NaN: " + str(len(counter_nan[counter_nan==0]))
print " [+] Number of total columns: " + str(len(data.columns))
def plot_histogram(x):
plt.figure(figsize=(15,10))
plt.hist(x, alpha=0.5)
plt.title("Histogram of '{var_name}'".format(var_name=x.name))
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
plot_histogram(data['macdh'])
plot_histogram(data['cci'])
import matplotlib.mlab as mlab
mu = data['close_-1_r'].mean()
sigma = data['close_-1_r'].std()
x = data['close_-1_r']
num_bins = 50
fig, ax = plt.subplots(figsize=(15,10))
n, bins, patches = ax.hist(x, num_bins, normed=1)
y = mlab.normpdf(bins, mu, sigma)
ax.plot(bins, y, '--')
ax.set_title('Histogram of 1-day Change $\mu=' + str(mu) + '$, $\sigma=' + str(sigma) + '$')
plt.show()
label_display = pd.DataFrame()
label_display['close'] = data['close']
label_display['from_yesterday_rate'] = data['close_-1_r']
y1 = data['close_-1_r'].shift(-1)
y1 = y1.apply(lambda x:1 if x>0.0000 else 0)
label_display['y'] = y1
display(label_display.head(7))
def plot_histogram_dv(x,y):
plt.figure(figsize=(15,10))
plt.hist(list(x[y==0]), alpha=0.5, label='Bear')
plt.hist(list(x[y==1]), alpha=0.5, label='Bull')
plt.title("Histogram of '{var_name}' by Forecast Target".format(var_name=x.name))
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.legend(loc='upper right')
plt.show()
plot_histogram_dv(data['macdh'], y1)
plot_histogram_dv(data['cci'], y1)
plot_histogram_dv(data['adx'], y1)
plot_histogram_dv(data['kdjk'], y1)
Different techniques for representing a price movement can be compared in order to select the one with the best results.
data.plot(x=data.index, y=['close_20_sma','adx', 'cci'], figsize=(15, 10))
#Labeling the different window frames
##Signaling the difference between a feature datapoint and the previous/next one
def labelwf(dataframe, wf):
for i in wf:
swf = str(i)
dataframe['label' + swf] = \
(dataframe['close'] - dataframe['close'].shift(i))/dataframe['close'].shift(i)
dataframe['label' + swf] = dataframe['label' + swf].apply(lambda x:1 if x>0.0 else 0)
return dataframe
#Negative values look at future datapoints
#Positive values look backwards
window_frames = [-1, -2, -15, 1, 2, 15]
labeled_data = labelwf(data.copy(), window_frames)
index = list(range(len(data)))
index = index[-250:-15]
label1 = labeled_data['label-1'].values
label1 = label1[-250:-15]
label15 = labeled_data['label-15'].values
label15 = label15[-250:-15]
c1 = copy['close_1_r'].apply(lambda x:0 if x>0.000 else 1)
c15 = copy['close_15_r'].apply(lambda x:0 if x>0.000 else 1)
index = list(range(len(c1)))
index = index[-250:-15]
fig, ax = plt.subplots(figsize=(15, 8), sharey=True)
ax.plot(index, c1[-250:-15], label='1d forward', color='r')
ax.scatter(index, c15[-250:-15], label='15d forward', color='g')
ax.legend()
labeled_data['index'] = list(range(len(data)))
data.plot(y='close', figsize=(15, 8))
for r in labeled_data.iterrows():
if r[1]['label1'] == 1:
plt.axvline(x=r[1]['index'], linewidth=0.3, alpha=0.3, color='g')
else:
plt.axvline(x=r[1]['index'], linewidth=0.3, alpha=0.3, color='r')
plt.show()
#Normalizing the feature datapoints according to their window frame:
#each datapoint becomes the percentage change over the timeframe
def percent_change(dataframe, wf):
new = pd.DataFrame()
swf = str(wf)
for feature in dataframe:
if 'label' in str(dataframe[feature].name):
pass
elif 'change_' in str(dataframe[feature].name):
pass
else:
dataframe['change_' + str(dataframe[feature].name)] = \
(dataframe[feature] - dataframe[feature].shift(wf))/dataframe[feature].shift(wf)
new['change_' + str(dataframe[feature].name)] = \
(dataframe[feature] - dataframe[feature].shift(wf))/dataframe[feature].shift(wf)
return dataframe, new
raw_data = data.copy()
data, percent_change_data = percent_change(data, 1)
data = data.drop('change_pdm', 1)
data = data.drop('change_um', 1)
data = data.drop('change_dm', 1)
percent_change_data = percent_change_data.drop('change_pdm', 1)
percent_change_data = percent_change_data.drop('change_um', 1)
percent_change_data = percent_change_data.drop('change_dm', 1)
percent_change_data = percent_change_data.replace([np.inf, -np.inf], np.nan)
percent_change_data = percent_change_data.fillna(method='bfill')
data = data.replace([np.inf, -np.inf], np.nan)
data = data.fillna(method='bfill')
data.plot(x=data.index, y='change_close_20_sma', figsize=(15,10))
data.plot(x=data.index, y=['change_kdjk','change_adx', 'change_close_20_sma'], figsize=(15,10))
display(data.tail())
display(percent_change_data.tail())
plot_histogram_dv(data['change_macdh'], y1)
plot_histogram_dv(data['change_macdh'], c15)
#How abnormal was the change compared to the feature range
def normalized_range(dataframe, wf):
swf = str(wf)
new = pd.DataFrame()
for feature in dataframe:
if 'label' in str(dataframe[feature].name):
pass
elif 'change_' in str(dataframe[feature].name):
pass
elif 'rchange_' in str(dataframe[feature].name):
pass
else:
try:
range = dataframe['change_' + str(dataframe[feature].name)].max() - \
dataframe['change_' + str(dataframe[feature].name)].min()
dataframe['rchange_' + str(dataframe[feature].name)] = \
dataframe['change_' + str(dataframe[feature].name)] / range
new['rchange_' + str(dataframe[feature].name)] = \
dataframe['change_' + str(dataframe[feature].name)] / range
except:
pass
return dataframe, new
change_data = data.copy()
data, normalized_range_data = normalized_range(data, 1)
data.plot(x=data.index, y=['rchange_close_20_sma','rchange_adx', 'rchange_close'], figsize=(15,10))
data = data.replace([np.inf, -np.inf], np.nan)
data = data.fillna(method='bfill')
normalized_range_data = normalized_range_data.replace([np.inf, -np.inf], np.nan)
normalized_range_data = normalized_range_data.fillna(method='bfill')
display(data.tail())
display(normalized_range_data.tail())
plot_histogram_dv(normalized_range_data['rchange_rsi_6'], y1)
plot_histogram_dv(normalized_range_data['rchange_rsi_6'], c15)
#How abnormal was this change percentage ratio in comparison to the others
def normalized_change(dataframe, wf):
swf = str(wf)
new = pd.DataFrame()
for feature in dataframe:
if 'label' in str(dataframe[feature].name):
pass
elif 'change_' in str(dataframe[feature].name):
pass
elif 'rchange_' in str(dataframe[feature].name):
pass
elif 'nchange_' in str(dataframe[feature].name):
pass
else:
try:
std = dataframe['change_' + str(dataframe[feature].name)].std()
mean = dataframe['change_' + str(dataframe[feature].name)].mean()
dataframe['nchange_' + str(dataframe[feature].name)] = \
(dataframe['change_' + str(dataframe[feature].name)] - mean)/std
new['nchange_' + str(dataframe[feature].name)] = \
(dataframe['change_' + str(dataframe[feature].name)] - mean)/std
except:
pass
return dataframe, new
rchange_data = data.copy()
data, normalized_change_data = normalized_change(data, 1)
data = data.replace([np.inf, -np.inf], np.nan)
data = data.fillna(method='bfill')
normalized_change_data = normalized_change_data.replace([np.inf, -np.inf], np.nan)
normalized_change_data = normalized_change_data.fillna(method='bfill')
data.plot(x=data.index, y=['nchange_close_20_sma','nchange_adx', 'nchange_close'], figsize=(15, 10))
display(data.tail())
display(normalized_change_data.tail())
plot_histogram_dv(normalized_change_data['nchange_rsi_6'], y1)
plot_histogram_dv(normalized_change_data['nchange_rsi_6'], c15)
#How abnormal is the position that the datapoint is located at
#We substitute the original feature value for this one
def distance(dataframe):
new = pd.DataFrame()
for feature in dataframe:
if 'label' in str(dataframe[feature].name):
pass
elif 'change_' in str(dataframe[feature].name):
pass
elif 'nchange_' in str(dataframe[feature].name):
pass
elif 'rchange_' in str(dataframe[feature].name):
pass
elif 'distance_' in str(dataframe[feature].name):
pass
else:
std = dataframe[feature].std()
mean = dataframe[feature].mean()
dataframe['distance_' + str(dataframe[feature].name)] = (dataframe[feature] - mean)/std
new['distance_' + str(dataframe[feature].name)] = (dataframe[feature] - mean)/std
return dataframe, new
nchange = data.copy()
data, distance_data = distance(data)
data = data.replace([np.inf, -np.inf], np.nan)
data = data.fillna(method='bfill')
distance_data = distance_data.replace([np.inf, -np.inf], np.nan)
distance_data = distance_data.fillna(method='bfill')
data.plot(x=data.index, y=['distance_close_20_sma','distance_adx', 'close_20_sma'], figsize=(15,10))
display(data.tail())
display(distance_data.tail())
plot_histogram_dv(distance_data['distance_macdh'], y1)
plot_histogram_dv(data['macdh'], y1)
plot_histogram_dv(distance_data['distance_macdh'], c15)
plot_histogram_dv(data['macdh'], c15)
from itertools import combinations
from sklearn.preprocessing import PolynomialFeatures
def add_interactions(df):
# Get feature names
combos = list(combinations(list(df.columns), 2))
colnames = list(df.columns) + ['_'.join(x) for x in combos]
# Find interactions
poly = PolynomialFeatures(interaction_only=True, include_bias=False)
df = poly.fit_transform(df)
df = pd.DataFrame(df)
df.columns = colnames
# Remove interaction terms with all 0 values
noint_indicies = [i for i, x in enumerate(list((df == 0).all())) if x]
df = df.drop(df.columns[noint_indicies], axis=1)
return df
teste = add_interactions(data.copy())
print (teste.head(5))
Implemented in Python 3.6; you can check the code in the ./code/ga/ directory, as this Jupyter notebook uses a Python 2.7 kernel.
A genetic algorithm that selects the best features to be used as input to the Support Vector Machine classifier from TensorFlow is implemented, as described in the paper below:
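Since ./code/ga/ sits outside this notebook, here is a toy sketch of GA-based feature selection under stated assumptions: each individual is a bit mask over the feature columns, and fitness is supplied by the caller (in the real system it would be classifier accuracy under cross-validation). All names are illustrative, not the actual implementation.

```python
import random

def evolve_masks(n_features, fitness, pop_size=20, generations=30, seed=0):
    """Evolve bit masks over the feature columns, maximizing `fitness`."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)   # single-point crossover
            child = a[:cut] + b[cut:]
            child[rng.randrange(n_features)] ^= 1  # point mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```

The returned mask indicates which feature columns to feed to the classifier.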
from IPython.core.display import Image
Image('1.png')
Image('2.png')
Image('3.png')
import numpy as np
from sklearn.feature_selection import f_regression, mutual_info_regression
y_15 = c15[15:-15]
y_1 = c1[15:-15]
mi = mutual_info_regression(distance_data, y_15, discrete_features='auto')
#print test.columns
mi /= np.max(mi)
result = distance_data.columns[mi > 0.1]
miresult = result
mi = mi[mi > 0.1]
print len(result)
display(result)
mi_df = pd.DataFrame(index=result, columns=['value'])
mi_df['value'] = mi
mi_df.plot(figsize=(15,10))
display(mi_df.head())
print mi_df
print "\n"
ftest, pval = f_regression(distance_data, y_15)
ftest /= np.max(ftest)
ftest[np.isnan(ftest)] = 0.0
result = distance_data.columns[ftest > 0.1]
f = ftest[ftest > 0.1]
#print f.max()
#print result.max()
print len(result)
print result
f_df = pd.DataFrame(index=result, columns=['value'])
f_df['value'] = f
f_df.plot(figsize=(15,10))
display(f_df.head())
print f_df
equal = []
for i in miresult.values:
if i in result.values:
equal.append(i)
print "\n"
display(equal)
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
data_pca = pd.DataFrame(pca.fit_transform(distance_data))
#display(data_pca.head())
data_pca.plot(figsize=(15,10))
datatest = pca.fit_transform(distance_data)
plt.figure(num=None, figsize=(18, 11), dpi=80, facecolor='w', edgecolor='k')
plt.scatter(datatest[:, 0], datatest[:, 1])
plt.show()
Transforming the data into a similarity matrix to compare how similar a given datapoint is to the rest
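One common way to build such a similarity matrix is pairwise cosine similarity between datapoints; the plain-Python sketch below illustrates the idea (the notebook's actual metric is not specified, so this is an assumption).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def similarity_matrix(points):
    """Square matrix where entry [i][j] compares datapoint i with j."""
    return [[cosine(p, q) for q in points] for p in points]
```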
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.cross_validation import train_test_split
from sklearn.metrics import accuracy_score
# t-distributed Stochastic Neighbor Embedding (t-SNE) visualization
from sklearn.manifold import TSNE
tsne = TSNE(n_components=2, random_state=0)
x_test_2d = tsne.fit_transform(distance_data)
#y_test = y_15
y_tsne = []
for key, i in np.ndenumerate(y_15):
if i == 0:
if y_1[key[0]] == 0:
y_tsne.append(0)
elif y_1[key[0]] == 1:
y_tsne.append(1)
if i == 1:
if y_1[key[0]] == 0:
y_tsne.append(2)
elif y_1[key[0]] == 1:
y_tsne.append(3)
y_test = np.array(y_tsne)
markers=('s', 'd', 'o', '^', 'v')
color_map = {0:'red', 1:'blue', 2:'lightgreen', 3:'purple'}
plt.figure(figsize=(15,10))
for idx, cl in enumerate(np.unique(y_test)):
plt.scatter(x=x_test_2d[y_test==cl,0], y=x_test_2d[y_test==cl,1], c=color_map[idx], marker=markers[idx], label=cl, alpha=0.5)
plt.xlabel('X in t-SNE')
plt.ylabel('Y in t-SNE')
plt.legend(loc='upper left')
plt.title('t-SNE visualization of test data')
plt.show()
from fbprophet import Prophet
import numpy as np
test = data.copy()
test['ds'] = data.index
test['y'] = np.log(data['close'])
display(test.tail())
m = Prophet()
m.fit(test)
future = m.make_future_dataframe(periods=365)
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
m.plot(forecast)
m.plot_components(forecast)